Introduction

This report investigates B cell subtypes within pre- and postmenopausal Visium spatial transcriptomics samples. The report focuses on whether double-negative B cell subtypes—also known as atypical memory cells or age-associated B cells—could explain differences in patient outcomes and tumor composition between pre- and postmenopausal patient groups.

Premenopausal patients often present with more aggressive tumors and more severe diagnoses compared to their postmenopausal counterparts. This analysis explores how variations in B cell subtypes may contribute to these differences. Below, you will find a series of analyses suggesting that the double-negative B cell subtype is likely more enriched in premenopausal breast cancer tumors.

Initialization of the data pipeline in Python and R

The overall pipeline is implemented in Python and R. First, the 10x Visium datasets were imported using the Squidpy and Scanpy packages. Duplicate gene names within each dataset were made unique, and mitochondrial genes were identified by their “MT-” prefix. Quality control (QC) included calculating the percentage of mitochondrial content and excluding cells with more than 10% mitochondrial content. Cells with fewer than 600 counts or fewer than 500 detected genes were also filtered out, as were genes expressed in fewer than 10 cells. After filtering, the data were normalized by scaling expression levels so that the per-cell total count summed to 10 000, followed by log-transformation. Cell type annotations were obtained using the pre-trained CellTypist model (Immune_All_High), with majority voting to assign the most likely cell type to each cluster. The resulting AnnData objects contained gene expression, clustering, and cell type annotation information for downstream analysis.
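The QC thresholds and normalization described above can be sketched in plain NumPy. This is a minimal illustration, not the pipeline's actual code: the function name, argument names, and the order of the cell and gene filters are assumptions, and in practice the equivalent Scanpy preprocessing functions would operate on the AnnData object directly.

```python
import numpy as np

def qc_filter_and_normalize(counts, gene_names, max_pct_mt=10.0, min_counts=600,
                            min_genes=500, min_cells=10, target_sum=1e4):
    """Hypothetical helper mirroring the QC steps in the text.

    counts: (n_obs, n_genes) raw count matrix; gene_names: length n_genes.
    Returns the log-normalized matrix plus the boolean masks that were applied.
    """
    counts = np.asarray(counts, dtype=float)
    gene_names = np.asarray(gene_names, dtype=str)

    # Mitochondrial genes are identified by their "MT-" prefix.
    is_mt = np.char.startswith(gene_names, "MT-")

    total = counts.sum(axis=1)                 # total counts per observation
    n_genes = (counts > 0).sum(axis=1)         # detected genes per observation
    pct_mt = 100.0 * counts[:, is_mt].sum(axis=1) / np.maximum(total, 1.0)

    # Keep observations that pass all three thresholds.
    keep_obs = (pct_mt <= max_pct_mt) & (total >= min_counts) & (n_genes >= min_genes)
    counts = counts[keep_obs]

    # Assumption: genes are filtered after observations, on the retained rows.
    keep_genes = (counts > 0).sum(axis=0) >= min_cells
    counts = counts[:, keep_genes]

    # Scale each observation to target_sum total counts, then log(1 + x).
    totals = np.maximum(counts.sum(axis=1, keepdims=True), 1.0)
    normalized = np.log1p(counts / totals * target_sum)
    return normalized, keep_obs, keep_genes
```

Returning the two masks alongside the matrix makes it straightforward to subset any accompanying metadata (barcodes, spatial coordinates) consistently.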

Subsequently, the samples were filtered using binarized images based on tumor outlines that pathologists drew on each slide. In one sample (Post-06), a portion of the tumor area was additionally discarded due to abnormal morphology, although this did not impact the results. All samples were then converted to Seurat objects and further processed in R. Additional metrics were calculated and added to the Seurat objects for further analysis, including data integration-related metrics computed against multiple reference datasets and numerous gene signature scores. An additional normalization (SCTransform) was also applied.

Subsetting based on the input from the pathologists

As mentioned above, we consulted pathologists, who outlined the tissue areas to focus on based on tumor morphology. Restricting the analysis to these areas potentially reduced some of the variance between samples.

The tumor outlines for each sample were used to generate a black-and-white image mask in ImageJ (panel A). These binary masks were then converted into a binarized vector in Python. For each sample, the spatial coordinates of sequencing locations were scaled and rounded to match the dimensions of the high-resolution image used to generate the mask. Any spatial coordinates that corresponded to a mask value of 0 were discarded. In the panels below, the spatial coordinates are displayed before filtering (panel C) and after filtering (panel D). Additionally, panel B shows the assigned cell type identities for each spot (after spatial filtering).
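The mask-based filtering described above could look roughly like the following sketch. The function name, the (x, y) column order of the coordinates, and the use of a single scale factor between full-resolution and high-resolution images are assumptions made for illustration; spots that fall outside the mask image after scaling are treated here as non-tumor.

```python
import numpy as np

def filter_spots_by_mask(coords_fullres, scale_factor, mask):
    """Hypothetical helper: return a boolean array marking spots inside the mask.

    coords_fullres: (n_spots, 2) array of (x, y) pixel positions (assumed order).
    scale_factor:   factor mapping full-resolution pixels to mask pixels (assumed).
    mask:           2D array indexed as mask[y, x]; 0 = discard, nonzero = keep.
    """
    coords = np.asarray(coords_fullres, dtype=float)
    mask = np.asarray(mask)

    # Scale and round the coordinates to the mask's pixel grid.
    scaled = np.rint(coords * scale_factor).astype(int)
    x, y = scaled[:, 0], scaled[:, 1]

    # Guard against spots that land outside the mask image.
    in_bounds = (x >= 0) & (x < mask.shape[1]) & (y >= 0) & (y < mask.shape[0])

    keep = np.zeros(len(coords), dtype=bool)
    keep[in_bounds] = mask[y[in_bounds], x[in_bounds]] > 0
    return keep
```

The resulting boolean array can be used to subset the spots of a sample, which corresponds to the before and after views in panels C and D.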

Sample quality metrics

Below, several quality metrics are displayed, including the number of unique features (left), mitochondrial features (center), and the number of unique molecular identifiers mapping to the transcriptome for each of the datasets (right).

Corroborating the accuracy of the CellTypist-assigned B cell identities

To test the accuracy of the CellTypist-based cell type assignment, cosine similarity scores were computed for several cell types (including B cells) against an average expression vector for each cell type group, derived from a reference dataset (Wu et al., 2021). The cosine similarity scores for the B cells (panel A) largely align with the B cell type assignments from CellTypist (panel B). This comparison served two purposes. First, it provided a sanity check for the CellTypist assignment. Second, because cosine similarity scores are continuous, they can more easily be correlated with other metrics, such as gene signature scores.
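As a rough illustration of the cosine similarity check, the sketch below averages the reference expression of one cell type and scores every spot against that average. The function names and the assumption that the query and reference matrices share the same gene order are illustrative only, and are not taken from the original analysis code.

```python
import numpy as np

def mean_reference_profile(ref_expr, ref_labels, cell_type):
    """Average expression vector of one cell type in the reference (hypothetical helper)."""
    ref_expr = np.asarray(ref_expr, dtype=float)
    ref_labels = np.asarray(ref_labels)
    return ref_expr[ref_labels == cell_type].mean(axis=0)

def cosine_to_reference(expr, reference_profile):
    """Cosine similarity between each row of expr and a reference profile.

    Assumes both use the same genes in the same order. Rows with zero
    norm receive a score of 0 instead of NaN.
    """
    expr = np.asarray(expr, dtype=float)
    ref = np.asarray(reference_profile, dtype=float)
    numerator = expr @ ref
    denominator = np.linalg.norm(expr, axis=1) * np.linalg.norm(ref)
    return np.divide(numerator, denominator,
                     out=np.zeros_like(numerator), where=denominator > 0)
```

Scores near 1 indicate spots whose expression profile points in the same direction as the reference B cell average, regardless of sequencing depth, which is what makes the score convenient to correlate with gene signature scores.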